memory layer

Terms from Artificial Intelligence: humans at the heart of algorithms


Memory layers act as a form of key-value lookup. They work alongside the normal feed-forward layers to make large language models more computationally efficient, though they could be applied to other forms of neural network. A memory layer uses a sparsely connected network followed by a top-K layer/rule, so that only the K nodes with the highest activations feed their results forward. As with memorisation techniques, memory layers are fast to compute but relatively heavy on memory, although there are ways to implement them efficiently on parallel hardware. Note too that the top-K step can be viewed as a form of lateral inhibition, as the nodes effectively compete to be able to feed forward.
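
As a rough illustration, the Python/NumPy sketch below shows the basic idea under simple assumptions (the function and variable names are purely illustrative, not from any particular implementation): each memory slot scores the incoming query against its key, only the K highest-scoring slots pass their values forward, and the winning values are blended using a softmax over their scores.

    import numpy as np

    def memory_layer(query, keys, values, k=4):
        """Minimal sketch of a memory-layer lookup (illustrative, not a reference implementation).

        query  : (d,)   activation vector from the previous layer
        keys   : (n, d) learned key vectors, one per memory slot
        values : (n, m) learned value vectors returned by each slot
        k      : number of slots allowed to 'win' the top-K competition
        """
        # Score every memory slot against the query (the key-value lookup).
        scores = keys @ query                      # shape (n,)

        # Top-K rule: only the K highest-scoring slots feed forward.
        # This acts like lateral inhibition, with slots competing to respond.
        top = np.argpartition(scores, -k)[-k:]     # indices of the K winners

        # Softmax over the winners only, then blend their value vectors.
        w = np.exp(scores[top] - scores[top].max())
        w /= w.sum()
        return w @ values[top]                     # shape (m,)

    # Example: 1024 memory slots, 64-dimensional keys, 32-dimensional values.
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((1024, 64))
    values = rng.standard_normal((1024, 32))
    query = rng.standard_normal(64)
    out = memory_layer(query, keys, values, k=4)
    print(out.shape)   # (32,)

Note how the trade-off shows up in the sketch: the table of keys and values can be very large (heavy on memory), but after the scoring step only K of the value vectors are actually used, which keeps the downstream computation cheap and maps well onto parallel hardware.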